from pathlib import Pathimport pandas as pdimport geopandas as gpdimport numpy as npimport matplotlib.pyplot as pltimport osmnx as ox%matplotlib inlineBASE_DIR = Path(".").resolve()DATA_DIR = BASE_DIR /"Data"PROCESSED_DIR = DATA_DIR /"processed"print("Base directory:", BASE_DIR)print("Processed directory:", PROCESSED_DIR)
1. Preliminary Study
1.1 Load processed data
This section loads the cleaned block group and station datasets produced in the data cleaning step, and checks basic properties such as the number of rows and the coordinate reference system.
To assess access to bike-share stations, we first calculated the projected distance from each census block group to its nearest active Indego station. We generated centroid geometries for all block groups and then performed a spatial nearest-neighbor join (sjoin_nearest) between the block group centroids and the projected Indego station locations (stations_proj).
The resulting distance values were then attached back to the block group dataset (bg_proj). We summarized the distribution of these distances using .describe(), producing our first accessibility indicator: the straight-line distance from each block group centroid to its nearest station, measured in feet.
plt.figure()bg["dist_to_station_ft"].dropna().plot(kind="hist", bins=30)plt.xlabel("Distance to nearest station (feet)")plt.ylabel("Number of block groups")plt.title("Distribution of distance to nearest Indego station")plt.show()
We then plotted a histogram of the distance values to examine how far block groups were from their nearest Indego station. The distribution showed that a large share of block groups were located relatively close to a station, forming a clear peak at shorter distances. However, the long right tail of the distribution indicated that many block groups particularly those outside the current Indego service area were much farther away.
This pattern reflects the spatial concentration of Indego stations in central Philadelphia: areas close to Center City exhibit very short station distances, while neighborhoods in the outer parts of the city experience substantially reduced access. Overall, the histogram highlighted a noticeable accessibility gap between centrally located and peripheral block groups.
Show code
import foliumimport branca.colormap as cmfrom branca.element import Template, MacroElementbg_proj = bg.copy()bg_proj["centroid"] = bg_proj.geometry.centroidbg_proj = bg_proj.set_geometry("centroid")bg_poly_wgs = bg.to_crs(epsg=4326).copy()bg_centroid_wgs = bg_proj.to_crs(epsg=4326).copy()bg_poly_wgs["dist_to_station_ft"] = bg["dist_to_station_ft"]center_lat = bg_centroid_wgs.geometry.y.mean()center_lon = bg_centroid_wgs.geometry.x.mean()m_dist_poly = folium.Map( location=[center_lat, center_lon], zoom_start=11, tiles="CartoDB Positron")dist_min = bg_poly_wgs["dist_to_station_ft"].min()dist_max = bg_poly_wgs["dist_to_station_ft"].max()orig_colors = cm.linear.PuRd_09.colorsrev_colors =list(reversed(orig_colors))colormap = cm.LinearColormap(rev_colors, vmin=dist_min, vmax=dist_max)def style_fn(feature): d = feature["properties"]["dist_to_station_ft"]return {"fillColor": colormap(d) if d isnotNoneelse"#cccccc","color": "#ffffff","weight": 0.3,"fillOpacity": 0.8, }folium.GeoJson( bg_poly_wgs, style_function=style_fn, tooltip=folium.GeoJsonTooltip( fields=["GEOID", "dist_to_station_ft"], aliases=["GEOID", "Distance to station (ft)"], localize=True )).add_to(m_dist_poly)colormap.caption ="Distance to nearest station (ft)"colormap.add_to(m_dist_poly)legend_html ="""{% macro html(this, kwargs) %}<div style=" position: fixed; bottom: 20px; left: 20px; z-index: 9999; background-color: white; padding: 12px 14px; border: 1px solid #ccc; border-radius: 6px; box-shadow: 0 0 10px rgba(0,0,0,0.15); font-size: 13px; line-height: 1.4;"><b>Distance to Indego Station</b><br>Darker areas indicate <b>shorter<br>distance</b> to the nearest station.<br>Lighter areas represent block<br>groups farther away (in feet).</div>{% endmacro %}"""legend = MacroElement()legend._template = Template(legend_html)m_dist_poly.get_root().add_child(legend)m_dist_poly
Make this Notebook Trusted to load map: File -> Trust Notebook
We also created an interactive map to visualize block-level accessibility to Indego stations. Each block group was shaded according to its straight-line distance to the nearest active station, allowing us to explore spatial patterns in access across Philadelphia. Darker shades represented higher accessibility, meaning the block group was located closer to a station, while lighter areas indicated neighborhoods that faced greater distances and potentially reduced access.
Hovering over a block group revealed a pop-up containing its exact distance value in feet, enabling more precise comparison across locations. This visualization made it clear that most stations were concentrated in central Philadelphia, whereas block groups farther from the urban core showed markedly lower accessibility. The map therefore provided an intuitive overview of the spatial disparities in station placement and helped identify areas that may warrant future expansion.
1.3 Station counts within 800 meters
To capture local station density, we computed an 800 meter buffer around each block group centroid. Because our projected CRS (EPSG:2272) measures distance in feet, the buffer distance was converted from meters to feet before generating the buffer geometries. We then performed spatial joins between these centroid buffers and the Indego station point dataset.
After identifying all stations that fell within each buffer, we grouped the results by block group and counted the number of stations contained within the walkable 800 meter zone. These counts served as an additional accessibility indicator, representing how many stations were located within a reasonable walking distance of each block group.
plt.figure()bg["stations_800m"].plot(kind="hist", bins=20)plt.xlabel("Number of stations within 800 m")plt.ylabel("Number of block groups")plt.title("Station counts within 800 m of block group centroids")plt.show()
The histogram of station counts within the 800 meter buffer showed a strongly right-skewed distribution. Most block groups contained zero or one station within walking distance, indicating relatively limited access for much of the city. A smaller number of block groups had multiple stations within the buffer (in central area), reflecting much higher levels of walkable station availability. This distribution underscored the spatial imbalance in station density across neighborhoods.
Make this Notebook Trusted to load map: File -> Trust Notebook
We then created an interactive map to visualize the number of stations located within each block group’s 800 meter walking buffer. Darker colors represented block groups with higher station availability, while lighter shades indicated areas with fewer or no stations nearby.
The spatial pattern clearly showed that central Philadelphia had the greatest walkable access to Indego stations, with many block groups containing several stations within relatively short distances. In contrast, block groups in West, North, and far South Philadelphia appeared much lighter, revealing limited access and highlighting potential areas for future bike-share expansion.
1.4 Proximity to the bicycle network
To capture the structure of the city’s bicycle infrastructure, we used OSMnx to extract a bike-specific street network from OpenStreetMap. This network provided a more realistic representation of cycling mobility than simple straight-line measurements and allowed us to assess the spatial distribution of bike infrastructure across Philadelphia.
We then computed the distance from each block group centroid to the nearest bike-network edge. Using a nearest-neighbor spatial join (sjoin_nearest) between the centroid geometries and the bike-edge dataset, we recorded the distance as dist_to_bikelane_ft. For block groups where multiple bike edges intersected the same area, we grouped the results by block group index and retained the minimum distance.
This process produced our second key accessibility metric: each block group’s proximity to the nearest bike lane, measured in feet.
Show code
plt.figure()bg["dist_to_bikelane_ft"].dropna().plot(kind="hist", bins=30)plt.xlabel("Distance to nearest bike facility (feet)")plt.ylabel("Number of block groups")plt.title("Distribution of distance to the bicycle network")plt.show()
We plotted a histogram of bike-lane distances to examine how cycling accessibility varied across the city. The distribution revealed that many block groups were located relatively close to the bike network, forming a clear peak at shorter distances. However, a long right tail indicated that numerous block groups outside of central core were much more distant from designated bike facilities. This demonstrated substantial variation in infrastructure access across neighborhoods.
Show code
import foliumimport branca.colormap as cmbg_proj = bg.copy()bg_proj["centroid"] = bg_proj.geometry.centroidbg_proj = bg_proj.set_geometry("centroid")bg_poly_wgs = bg.to_crs(epsg=4326).copy()bg_centroid_wgs = bg_proj.to_crs(epsg=4326).copy()bg_poly_wgs["dist_to_bikelane_ft"] = bg["dist_to_bikelane_ft"]center_lat = bg_centroid_wgs.geometry.y.mean()center_lon = bg_centroid_wgs.geometry.x.mean()m_bike_poly = folium.Map( location=[center_lat, center_lon], zoom_start=11, tiles="CartoDB Positron")bike_min = bg_poly_wgs["dist_to_bikelane_ft"].min()bike_max = bg_poly_wgs["dist_to_bikelane_ft"].max()orig_colors = cm.linear.YlGnBu_09.colorsrev_colors =list(reversed(orig_colors))bike_cmap = cm.LinearColormap(rev_colors, vmin=bike_min, vmax=bike_max)def style_bike(feature): v = feature["properties"]["dist_to_bikelane_ft"]return {"fillColor": bike_cmap(v) if v isnotNoneelse"#cccccc","color": "#ffffff","weight": 0.3,"fillOpacity": 0.85, }folium.GeoJson( bg_poly_wgs, style_function=style_bike, tooltip=folium.GeoJsonTooltip( fields=["GEOID", "dist_to_bikelane_ft"], aliases=["GEOID", "Distance to bike facility (ft)"], localize=True )).add_to(m_bike_poly)# Add the colorbarbike_cmap.caption ="Distance to nearest bike facility (ft)"bike_cmap.add_to(m_bike_poly)# Add descriptive legend boxfrom branca.element import Template, MacroElementlegend_html ="""{% macro html(this, kwargs) %}<div style=" position: fixed; bottom: 20px; left: 20px; z-index: 9999; background-color: white; padding: 12px 14px; border: 1px solid #ccc; border-radius: 6px; box-shadow: 0 0 10px rgba(0,0,0,0.15); font-size: 13px; line-height: 1.4;"><b>Distance to Bike Network</b><br>Darker areas indicate <b>shorter distance</b><br>(closer to a bike lane).<br>Lighter areas are farther away.</div>{% endmacro %}"""legend = MacroElement()legend._template = Template(legend_html)m_bike_poly.get_root().add_child(legend)m_bike_poly
Make this Notebook Trusted to load map: File -> Trust Notebook
We visualized the distance-to-bike-lane metric through an interactive map. Darker shading represented block groups that were closer to the bike network, indicating stronger bicycle infrastructure access, while lighter shading marked areas farther from designated bike facilities.
The map showed that most block groups across Philadelphia were relatively close to at least one bike facility, with large portions of the city appearing in darker blue tones. However, a small number of block groups on the outer edges of the city, particularly in far South Philadelphia and parts of the Northeast, stood out in much lighter colors. These pockets highlighted locations where residents were substantially farther from the bike network and where future connections or extensions could be most impactful.
2. Need Score
We first computed transportation-related social needs at the census block group level using ACS 2023 variables. Total population, median household income, zero-vehicle households, and bicycle commuters served as indicators of transportation need. Median household income was used as a proxy for economic vulnerability, while zero-vehicle households and bicycle commuters were treated as indicators of dependence on non-automobile modes.
All components were converted to numeric form and rescaled to a 0–1 range using the custom minmax() function. Because lower incomes represent higher need, the income component was inverted after scaling. The three standardized components—income (inverted), zero-vehicle households, and bicycle commuters—were then averaged to create a single need_score for each block group.
The need score combines three indicators at the block group level:
Median household income. Lower income suggests greater need for affordable transport options.
Zero vehicle households as a share of all households. More households without cars indicates stronger dependence on shared mobility.
Bicycle commuters as a share of the population. A higher bicycle commuting share suggests stronger reliance on bicycle friendly conditions.
All three indicators are scaled to a zero to one range before being combined.
plt.figure()bg["need_score"].dropna().plot(kind="hist", bins=30)plt.xlabel("Need score")plt.ylabel("Number of block groups")plt.title("Distribution of need score")plt.show()
We then examined the distribution of the need score using a histogram. The distribution showed a concentration of block groups in the mid-to-high range of need, with fewer block groups exhibiting extremely low scores. This pattern suggested that transportation-related need was widespread across the city, though certain neighborhoods demonstrated substantially higher levels of vulnerability than others.
Make this Notebook Trusted to load map: File -> Trust Notebook
We also mapped the spatial pattern of the need score across Philadelphia. Darker shades represented block groups with higher levels of transportation need.
The spatial distribution revealed that higher-need areas were concentrated in North Philadelphia, West Philadelphia, and portions of the Lower Northeast—neighborhoods that often exhibit lower incomes and greater reliance on non-automobile travel modes. In contrast, Center City and several higher-income areas displayed lighter need scores, reflecting lower levels of socioeconomic vulnerability. This map provided a clear geographic picture of where bike infrastructure investment may offer the greatest social benefit.
3. Access Score
To synthesize multiple accessibility indicators into a single measure, we defined three core components: (1) the number of Indego stations within an 800 meter buffer (stations_within_800m), (2) the straight-line distance to the nearest station (dist_to_station_ft), and (3) the distance to the nearest bike lane (dist_to_bikelane_ft).
All three variables were first coerced to numeric form. Each component was then scaled to a 0–1 range using the minmax() function. For the two distance-based indicators, we multiplied values by −1 prior to scaling so that shorter distances corresponded to higher access levels. This produced three standardized components that reflected station density, proximity to stations, and proximity to bike lanes.
The access score combines three indicators:
Number of stations within eight hundred meters of each block group centroid.
Distance to the nearest Indego station.
Distance to the nearest bicycle facility.
Station counts increase access, while shorter distances to stations and bicycle facilities indicate better access. Distances are inverted before scaling so that higher values always represent better access.
The final access_score for each block group was calculated as the mean of the three standardized components. We summarized the distribution of the resulting score using descriptive statistics and a histogram to understand variation in access across the city.
Show code
plt.figure()bg["access_score"].dropna().plot(kind="hist", bins=30)plt.xlabel("Access score")plt.ylabel("Number of block groups")plt.title("Distribution of access score")plt.show()
The histogram of access scores showed a strong skew toward lower and moderate access levels, with relatively few block groups achieving high overall accessibility. This indicated that while a small portion of the city benefited from strong multimodal access, most neighborhoods exhibited substantially lower levels of access to bike-share stations and bike infrastructure.
Make this Notebook Trusted to load map: File -> Trust Notebook
The map of access scores visualized the spatial distribution of bicycle infrastructure accessibility across Philadelphia. Darker areas represented block groups with higher access, while lighter areas indicated lower levels of access.
The spatial pattern showed that accessibility was overwhelmingly concentrated in central Philadelphia, where station density is highest and bike lanes are more extensive. Surrounding neighborhoods in West, North, and Northeast Philadelphia appeared significantly lighter, reflecting limited proximity to both stations and the bike network. These disparities highlighted the uneven geography of bicycle infrastructure access and pointed to clear areas where system expansion could improve regional mobility.
4. Equity Gap
To evaluate equity directly, we compared the need_score and access_score for each census block group. We first created a scatterplot of need versus access to illustrate the overall relationship between social need and bicycle infrastructure access across Philadelphia. A 45-degree reference line was added to represent perfect alignment between need and access.
We then computed an equity gap measure defined as:
equity_gap = need_score − access_score
Positive values indicated block groups where measured need exceeded access, suggesting potential underserved areas. Negative values represented block groups where access was relatively stronger than estimated need.
To facilitate comparison across the city, we divided the equity gap values into quartiles using qcut, assigning each block group to one of four categories: “Lowest gap,” “Low–medium,” “Medium–high,” and “Highest gap.” These quartile categories provided a simple typology for mapping and prioritizing neighborhoods based on their relative equity gaps.
plt.figure()bg["equity_gap"].dropna().plot(kind="hist", bins=30)plt.xlabel("Equity gap (need minus access)")plt.ylabel("Number of block groups")plt.title("Distribution of equity gap")plt.show()
The histogram of equity gap values showed a distribution centered around moderately negative scores, with most block groups falling between approximately −0.45 and −0.20. This pattern indicated that, for much of the city, infrastructure access exceeded measured social need to some degree. A smaller number of block groups exhibited strongly negative values, while very few approached zero, meaning that only a limited set of neighborhoods had need levels that nearly matched their access levels. Overall, the distribution suggested a broad mismatch between access and need, with underserved areas emerging primarily where equity gap values were closer to zero rather than in the strongly negative range.
The scatterplot comparing need and access scores provided a clear visual summary of equity conditions across Philadelphia. Each point represented a block group, colored by its equity gap value. Most points fell below the 45-degree reference line, indicating that access was generally higher than measured need throughout much of the city.
Block groups shaded in red were located nearer to the reference line, meaning their transportation need exceeded or nearly matched their access level. These areas represented the relatively underserved neighborhoods. In contrast, the blue-shaded points, located well below the reference line, reflected block groups where access substantially exceeded need.
The wide horizontal spread of access scores compared to the more compressed vertical distribution of need scores suggested that variation in infrastructure access was the primary driver of equity differences rather than variation in underlying social need.
Show code
import foliumimport branca.colormap as cmfrom branca.element import Template, MacroElementbg_proj = bg.copy()bg_proj["centroid"] = bg_proj.geometry.centroidbg_proj = bg_proj.set_geometry("centroid")bg_poly_wgs = bg.to_crs(epsg=4326).copy()bg_centroid_wgs = bg_proj.to_crs(epsg=4326).copy()bg_poly_wgs["equity_gap"] = bg["equity_gap"]bg_poly_wgs["equity_gap_q"] = bg["equity_gap_q"].astype(str)center_lat = bg_centroid_wgs.geometry.y.mean()center_lon = bg_centroid_wgs.geometry.x.mean()m_gap = folium.Map( location=[center_lat, center_lon], zoom_start=11, tiles="CartoDB Positron")gap_min = bg_poly_wgs["equity_gap"].min()gap_max = bg_poly_wgs["equity_gap"].max()orig_colors = cm.linear.RdYlBu_11.colorsrev_colors =list(reversed(orig_colors))gap_cmap = cm.LinearColormap(rev_colors, vmin=gap_min, vmax=gap_max)def style_gap(feature): v = feature["properties"]["equity_gap"]return {"fillColor": gap_cmap(v) if v isnotNoneelse"#cccccc","color": "#ffffff","weight": 0.3,"fillOpacity": 0.85, }folium.GeoJson( bg_poly_wgs, style_function=style_gap, tooltip=folium.GeoJsonTooltip( fields=["GEOID", "need_score", "access_score", "equity_gap", "equity_gap_q"], aliases=["GEOID", "Need", "Access", "Equity gap", "Gap quartile"], localize=True )).add_to(m_gap)gap_cmap.caption ="Equity gap (need minus access)"gap_cmap.add_to(m_gap)legend_html ="""{% macro html(this, kwargs) %}<div style=" position: fixed; bottom: 20px; left: 20px; z-index: 9999; background-color: white; padding: 12px 14px; border: 1px solid #ccc; border-radius: 6px; box-shadow: 0 0 10px rgba(0,0,0,0.15); font-size: 13px; line-height: 1.4;"><b>Equity Gap</b><br>Computed as <b>need score minus<br>access score</b> for each block group.<br><br>Warmer (red) areas indicate<br><b>higher equity gaps</b> where need<br>is high but access is low.<br>Cooler (blue) areas indicate<br><b>lower equity gaps</b>.</div>{% endmacro %}"""legend = MacroElement()legend._template = Template(legend_html)m_gap.get_root().add_child(legend)m_gap
Make this Notebook Trusted to load map: File -> Trust Notebook
For the final results, we used Folium to build interactive web maps that visualize need, access, and equity gaps at the block group level. All geometries were reprojected to WGS84 (EPSG:4326) for compatibility with web mapping basemaps.
We first created a single-layer equity gap map in which each block group was colored according to its quartile category using a custom diverging color scheme. GeoJSON tooltips displayed each block group’s GEOID, raw equity gap value, and assigned quartile. An accompanying HTML legend clarified the meaning of the color categories.
The equity gap map highlights where social need exceeds access to bicycle infrastructure. Warmer colors (orange to red) represent block groups with higher equity gap values, meaning these neighborhoods have relatively higher transportation need but lower access to bike lanes or Indego stations. These areas appear most prominently in Northeast and North Philadelphia, as well as parts of Upper Northwest.
Cooler colors (light to dark blue) indicate areas where access is comparatively high relative to measured need. These clusters are concentrated around Center City and University City, where bike lanes and Indego stations are densest.
The strong spatial contrast between central Philadelphia and the outlying neighborhoods suggests that current bicycle infrastructure investments are unevenly distributed, with underserved areas located primarily farther from the city’s existing bike network.
5. Interactive Equity Mapping with Folium
Show code
import foliumimport branca.colormap as cmimport geopandas as gpdstations_2272 = gpd.read_file(PROCESSED_DIR /"stations_clean_2272.gpkg", layer="stations")stations_wgs = stations_2272.to_crs(epsg=4326).copy()bg_wgs = bg.to_crs(epsg=4326).copy()center_lat = bg_wgs.geometry.centroid.y.mean()center_lon = bg_wgs.geometry.centroid.x.mean()m_all = folium.Map( location=[center_lat, center_lon], zoom_start=11, tiles="CartoDB Positron")need_min = bg_wgs["need_score"].min()need_max = bg_wgs["need_score"].max()need_cmap = cm.LinearColormap( cm.linear.Purples_09.colors, vmin=need_min, vmax=need_max)need_fg = folium.FeatureGroup(name="Need Score", show=False)def style_need(feature): val = feature["properties"]["need_score"]return {"fillColor": need_cmap(val) if val isnotNoneelse"#cccccc","color": "#ffffff","weight": 0.3,"fillOpacity": 0.85, }folium.GeoJson( bg_wgs, style_function=style_need, tooltip=folium.GeoJsonTooltip( fields=["GEOID", "need_score"], aliases=["GEOID", "Need score"], localize=True )).add_to(need_fg)need_fg.add_to(m_all)access_min = bg_wgs["access_score"].min()access_max = bg_wgs["access_score"].max()access_cmap = cm.LinearColormap( cm.linear.Blues_09.colors, vmin=access_min, vmax=access_max)access_fg = folium.FeatureGroup(name="Access Score", show=False)def style_access(feature): val = feature["properties"]["access_score"]return {"fillColor": access_cmap(val) if val isnotNoneelse"#cccccc","color": "#ffffff","weight": 0.3,"fillOpacity": 0.85, }folium.GeoJson( bg_wgs, style_function=style_access, tooltip=folium.GeoJsonTooltip( fields=["GEOID", "access_score"], aliases=["GEOID", "Access score"], localize=True )).add_to(access_fg)access_fg.add_to(m_all)gap_min = bg_wgs["equity_gap"].min()gap_max = bg_wgs["equity_gap"].max()gap_cmap = cm.LinearColormap(list(reversed(cm.linear.RdYlBu_11.colors)), vmin=gap_min, vmax=gap_max)gap_fg = folium.FeatureGroup(name="Equity Gap (Need minus Access)", show=True)def style_gap(feature): val = feature["properties"]["equity_gap"]return {"fillColor": gap_cmap(val) if val isnotNoneelse"#cccccc","color": "#ffffff","weight": 0.3,"fillOpacity": 0.85, }folium.GeoJson( bg_wgs, style_function=style_gap, tooltip=folium.GeoJsonTooltip( fields=["GEOID", "need_score", "access_score", "equity_gap"], aliases=["GEOID", "Need", "Access", "Equity gap"], localize=True )).add_to(gap_fg)gap_fg.add_to(m_all)gap_cmap.caption ="Equity Gap (Need minus Access)"gap_cmap.add_to(m_all)stations_fg = folium.FeatureGroup(name="Indego Stations", show=True)for _, row in stations_wgs.iterrows(): folium.CircleMarker( location=[row.geometry.y, row.geometry.x], radius=3, color="#004c6d", fill=True, fill_opacity=0.9, popup=row.get("name", "Indego station") ).add_to(stations_fg)stations_fg.add_to(m_all)folium.LayerControl(collapsed=False).add_to(m_all)legend_html ="""<div style=" position: fixed; bottom: 20px; left: 20px; z-index: 9999; background-color: white; padding: 10px 12px; border: 1px solid #ccc; border-radius: 4px; box-shadow: 0 1px 4px rgba(0,0,0,0.2); font-size: 12px;"><b>Layer guide</b><br><span style="color:#6b4c9a;">■</span> Need score – darker purple means higher need<br><span style="color:#2166ac;">■</span> Access score – darker blue means higher access<br><span style="color:#b2182b;">■</span> Equity gap – red means higher gap where need is high and access is low<br><span style="color:#2166ac;">■</span> Equity gap – blue means lower gap where access is closer to or above need</div>"""m_all.get_root().html.add_child(folium.Element(legend_html))m_all
Make this Notebook Trusted to load map: File -> Trust Notebook